Articulatory Feature Extraction Using CTC to Build Articulatory Classifiers Without Forced Frame Alignments for Speech Recognition

Authors

Basil Abraham

Srinivasan Umesh

Neethu Mariam Joy

Abstract

Articulatory features provide robustness to speaker and environment variability by incorporating speech production knowledge. Pseudo-articulatory features are articulatory features extracted by articulatory classifiers trained on speech data. A major problem in building articulatory classifiers is the need for speech data aligned to articulatory feature values at the frame level. Manually aligning data at the frame level is tedious, and alignments derived from phone alignments through a phone-to-articulatory-feature mapping are prone to errors. This paper proposes training an articulatory classifier with the connectionist temporal classification (CTC) criterion, using a bidirectional long short-term memory (BLSTM) recurrent neural network (RNN). The CTC criterion eliminates the need for forced frame-level alignments. Articulatory classifiers were also built with different neural network architectures, such as deep neural networks (DNN), convolutional neural networks (CNN) and BLSTM, trained with frame-level alignments, and were compared with the proposed CTC approach. Among the different architectures, articulatory features extracted using classifiers built with BLSTM gave better recognition performance. Further, the proposed approach of BLSTM with CTC gave the best overall performance on both the SVitchboard (6 hours) and Switchboard (33 hours) data sets.
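To illustrate why the CTC criterion needs no frame-level alignment, the following is a minimal sketch in plain Python, not the authors' implementation. It assumes symbol index 0 is the CTC blank and uses made-up per-frame posteriors in place of BLSTM outputs; the function names and the symbol inventory are hypothetical. CTC scores a label sequence by summing the probabilities of all frame-level paths that collapse to it (merge repeats, then drop blanks), which the forward recursion computes without enumerating paths.

```python
from itertools import product

BLANK = 0  # assumption: index 0 is the CTC blank symbol


def ctc_collapse(path, blank=BLANK):
    """Map a frame-level path to a label sequence: merge repeats, then drop blanks."""
    out = []
    prev = None
    for s in path:
        if s != prev and s != blank:
            out.append(s)
        prev = s
    return out


def ctc_label_probability(probs, labels, blank=BLANK):
    """Total probability of all paths that collapse to `labels`.

    probs[t][k] is the posterior of symbol k at frame t.
    Uses the CTC forward recursion over the blank-extended label sequence.
    """
    ext = [blank]
    for label in labels:
        ext += [label, blank]
    size = len(ext)

    # Initialisation: a path may start with a blank or with the first label.
    alpha = [0.0] * size
    alpha[0] = probs[0][blank]
    if size > 1:
        alpha[1] = probs[0][ext[1]]

    for t in range(1, len(probs)):
        new = [0.0] * size
        for s in range(size):
            total = alpha[s]                  # stay on the same symbol
            if s >= 1:
                total += alpha[s - 1]         # advance by one position
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                total += alpha[s - 2]         # skip the blank between distinct labels
            new[s] = total * probs[t][ext[s]]
        alpha = new

    # A valid path ends on the last label or on the trailing blank.
    return alpha[-1] + (alpha[-2] if size > 1 else 0.0)


def brute_force_probability(probs, labels, blank=BLANK):
    """Reference check: enumerate every path and sum those that collapse to `labels`."""
    num_symbols = len(probs[0])
    total = 0.0
    for path in product(range(num_symbols), repeat=len(probs)):
        if ctc_collapse(path, blank) == list(labels):
            p = 1.0
            for t, s in enumerate(path):
                p *= probs[t][s]
            total += p
    return total


if __name__ == "__main__":
    # Made-up posteriors for 3 frames over {blank, value 1, value 2}.
    posteriors = [[0.6, 0.3, 0.1], [0.2, 0.5, 0.3], [0.1, 0.2, 0.7]]
    target = [1, 2]  # hypothetical articulatory feature value sequence
    print(ctc_label_probability(posteriors, target))
    print(brute_force_probability(posteriors, target))
```

Training would minimise the negative logarithm of this probability for each utterance, so only the unaligned sequence of target values is required, not a label per frame.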

Similar resources

EARS: Electromyographical Automatic Recognition of Speech

In this paper, we present our research on automatic speech recognition of surface electromyographic signals that are generated by the human articulatory muscles. With parallel recorded audible speech and electromyographic signals, experiments are conducted to show the anticipatory behavior of electromyographic signals with respect to speech signals. Additionally, we demonstrate how to develop p...

Automatic Speech Recognition Based on Electromyographic Biosignals

This paper presents our studies of automatic speech recognition based on electromyographic biosignals captured from the articulatory muscles in the face using surface electrodes. We develop a phone-based speech recognizer and describe how the performance of this recognizer improves by carefully designing and tailoring the extraction of relevant speech feature toward electromyographic signals. O...

Backing-off Context- & Gender-dependent Models for Better Articulatory Feature Extraction

The majority of speech recognition systems today commonly use Hidden Markov Models (HMMs) as acoustic models in systems since they can powerfully train and map a speech utterance into a sequence of units. Such systems perform even better if the units employed are context-dependent and gender-dependent. Analogously, when HMM technology is applied to the problem of articulatory feature extraction...

Articulatory Feature Classification Using Nearest Neighbors

Recognizing aspects of articulation from audio recordings of speech is an important problem, either as an end in itself or as part of an articulatory approach to automatic speech recognition. In this paper we study the frame-level classification of a set of articulatory features (AFs) inspired by the vocal tract variables of articulatory phonology. We compare k nearest neighbor (k-NN) classifie...

An elitist approach to automatic articulatory-acoustic feature classification for phonetic characterization of spoken language

A novel framework for automatic articulatory-acoustic feature extraction has been developed for enhancing the accuracy of place- and manner-of-articulation classification in spoken language. The "elitist" approach provides a principled means of selecting frames for which multi-layer perceptron, neural-network classifiers are highly confident. Using this method it is possible to achieve a frame-...

Publication date: 2016
